Learning to See with Minimal Human Supervision
Deep learning has significantly advanced computer vision in the past decade, paving the way for practical applications such as facial recognition and autonomous driving. However, current techniques depend heavily on human supervision, limiting their broader deployment. This dissertation tackles this problem by introducing algorithms and theories to minimize human supervision in three key areas: data, annotations, and neural network architectures, in the context of various visual understanding tasks such as object detection, image restoration, and 3D generation.
First, we present self-supervised learning algorithms to handle in-the-wild images and videos that traditionally require time-consuming manual curation and labeling. We demonstrate that when a deep network is trained to be invariant to geometric and photometric transformations, representations from its intermediate layers are highly predictive of object semantic parts such as eyes and noses. This insight offers a simple unsupervised learning framework that significantly improves the efficiency and accuracy of few-shot landmark prediction and matching. We then present a technique for learning single-view 3D object pose estimation models by utilizing in-the-wild videos where objects turn (e.g., cars in roundabouts). This technique achieves competitive performance with respect to existing state-of-the-art without requiring any manual labels during training. We also contribute an Accidental Turntables Dataset, containing a challenging set of 41,212 images of cars in cluttered backgrounds, motion blur, and illumination changes that serve as a benchmark for 3D pose estimation.
Second, we address variations in labeling styles across different annotators, which lead to a type of label noise referred to as heterogeneous labels. This variability in human annotation can cause subpar performance during both the training and testing phases. To mitigate this, we have developed a framework that models the labeling styles of individual annotators, reducing the impact of human annotation variations and enhancing the performance of standard object detection models. We have also applied this framework to analyze ecological data, which are often collected opportunistically across different case studies without consistent annotation guidelines. Through this application, we have obtained several insights into large-scale bird migration behaviors and their relationship to climate change.
Our next study explores the challenges of designing neural networks, an area that lacks a comprehensive theoretical understanding. By linking deep neural networks with Gaussian processes, we propose a novel Bayesian interpretation of the deep image prior, which parameterizes a natural image as the output of a convolutional network with random parameters and random input. This approach offers valuable insights to optimize the design of neural networks for various image restoration tasks.
Lastly, we introduce several machine-learning techniques to reconstruct and edit 3D shapes from 2D images with minimal human effort. We first present a generic multi-modal generative model that bridges 2D images and 3D shapes via a shared latent space, and demonstrate its applications on versatile 3D shape generation and manipulation tasks. Additionally, we develop a framework for joint estimation of 3D neural scene representation and camera poses. This approach outperforms prior works and, unlike the baselines, allows us to operate in the general SE(3) camera pose setting. The results also indicate that this method can be complementary to classical structure-from-motion (SfM) pipelines, as it compares favorably to SfM on low-texture and low-resolution images.
Accidental Turntables: Learning 3D Pose by Watching Objects Turn
We propose a technique for learning single-view 3D object pose estimation
models by utilizing a new source of data -- in-the-wild videos where objects
turn. Such videos are prevalent in practice (e.g., cars in roundabouts,
airplanes near runways) and easy to collect. We show that classical
structure-from-motion algorithms, coupled with the recent advances in instance
detection and feature matching, provide surprisingly accurate relative 3D pose
estimation on such videos. We propose a multi-stage training scheme that first
learns a canonical pose across a collection of videos and then supervises a
model for single-view pose estimation. The proposed technique achieves
competitive performance with respect to existing state-of-the-art on standard
benchmarks for 3D pose estimation, without requiring any pose labels during
training. We also contribute an Accidental Turntables Dataset, containing a
challenging set of 41,212 images of cars in cluttered backgrounds, motion blur
and illumination changes that serves as a benchmark for 3D pose estimation.
Comment: Project website: https://people.cs.umass.edu/~zezhoucheng/acci-turn
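The "relative 3D pose" mentioned in the abstract above can be illustrated with a small sketch. It assumes that each frame's object orientation is given as a 3x3 rotation matrix and that pose error is measured by the geodesic angle between rotations; both are common conventions but are assumptions here, not details taken from the paper. The helper names (`rot_z`, `relative_rotation`, `geodesic_angle_deg`) are hypothetical.

```python
import math

def rot_z(deg):
    """Rotation matrix for a turn of `deg` degrees about the vertical (z) axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [list(row) for row in zip(*a)]

def relative_rotation(r_i, r_j):
    """Rotation that takes the orientation in frame i to the one in frame j."""
    return matmul(r_j, transpose(r_i))

def geodesic_angle_deg(r_a, r_b):
    """Angle (degrees) of the rotation R_a^T R_b, a standard rotation distance."""
    r = matmul(transpose(r_a), r_b)
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))

# Hypothetical example: a car seen at 30 and then 75 degrees of yaw
# while driving around a roundabout.
rel = relative_rotation(rot_z(30.0), rot_z(75.0))
err = geodesic_angle_deg(rel, rot_z(45.0))
```

In this toy case the relative rotation between the two frames is a 45 degree turn about the vertical axis, so `err` is zero up to round-off.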
Numb regulates cell–cell adhesion and polarity in response to tyrosine kinase signalling
Epithelial-mesenchymal transition (EMT), which can be caused by aberrant tyrosine kinase signalling, marks epithelial tumour progression and metastasis, yet the underlying molecular mechanism is not fully understood. Here, we report that Numb interacts with E-cadherin (E-cad) through its phosphotyrosine-binding domain (PTB) and thereby regulates the localization of E-cad to the lateral domain of epithelial cell–cell junction. Moreover, Numb engages the polarity complex Par3–aPKC–Par6 by binding to Par3 in polarized Madin-Darby canine kidney cells. Intriguingly, after Src activation or hepatocyte growth factor (HGF) treatment, Numb decouples from E-cad and Par3 and associates preferably with aPKC–Par6. Binding of Numb to aPKC is necessary for sequestering the latter in the cytosol during HGF-induced EMT. Knockdown of Numb by small hairpin RNA caused a basolateral-to-apicolateral translocation of E-cad and β-catenin accompanied by elevated actin polymerization, accumulation of Par3 and aPKC in the nucleus, an enhanced sensitivity to HGF-induced cell scattering, a decrease in cell–cell adhesion, and an increase in cell migration. Our work identifies Numb as an important regulator of epithelial polarity and cell–cell adhesion and a sensor of HGF signalling or Src activity during EMT.
Detecting and Tracking Communal Bird Roosts in Weather Radar Data
The US weather radar archive holds detailed information about biological
phenomena in the atmosphere over the last 20 years. Communally roosting birds
congregate in large numbers at nighttime roosting locations, and their morning
exodus from the roost is often visible as a distinctive pattern in radar
images. This paper describes a machine learning system to detect and track
roost signatures in weather radar data. A significant challenge is that labels
were collected opportunistically from previous research studies and there are
systematic differences in labeling style. We contribute a latent variable model
and EM algorithm to learn a detection model together with models of labeling
styles for individual annotators. By properly accounting for these variations
we learn a significantly more accurate detector. The resulting system detects
previously unknown roosting locations and provides comprehensive
spatio-temporal data about roosts across the US. This data will provide
biologists important information about the poorly understood phenomena of
broad-scale habitat use and movements of communally roosting birds during the
non-breeding season.
Comment: 9 pages, 6 figures, AAAI 2020 (AI for Social Impact Track)
A Lithium-Ion Pump Based on Piezoelectric Effect for Improved Rechargeability of Lithium Metal Anode.
Lithium metal is widely studied as the "crown jewel" of potential anode materials due to its high specific capacity and low redox potential. Unfortunately, Li dendrite growth limits its commercialization. Previous research has revealed that a uniform Li-ion flux on the electrode surface plays a vital role in achieving homogeneous Li deposition. In this work, a new strategy is developed by introducing a multifunctional Li-ion pump to improve the homogeneous distribution of Li ions. By coating a β-phase poly(vinylidene fluoride) (β-PF) film on Cu foil (Cu@β-PF), a piezoelectric potential across the film is established near the electrode surface because of its piezoelectric property, which serves as a driving force to regulate the migration of Li ions across the film. As a result, uniform Li-ion distribution is attained, and the Cu@β-PF shows coulombic efficiency around 99% throughout 200 cycles. Meanwhile, the lithium-sulfur full cell paired with the Li-Cu@β-PF anode exhibits excellent performance. This facile strategy of regulating Li-ion migration provides a new perspective for safe and reliable Li metal anodes.